NSF PAR Search | NSF Public Access Repository

Bias Reducing Multitask Learning on Mental Health Prediction

https://doi.org/10.1109/ACII55700.2022.9953850

Zanna, Khadija; Sridhar, Kusha; Yu, Han; Sano, Akane (October 2022, 2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII))

Full Text Available
More to Less (M2L): Enhanced Health Recognition in the Wild with Reduced Modality of Wearable Sensors

https://doi.org/10.1109/EMBC48229.2022.9871472

Yang, Huiyuan; Yu, Han; Sridhar, Kusha; Vaessen, Thomas; Myin-Germeys, Inez; Sano, Akane (July 2022, 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC))

Full Text Available
An Interpretable Deep Mutual Information Curriculum Metric for a Robust and Generalized Speech Emotion Recognition System

https://doi.org/10.1109/TASLP.2024.3507562

Lin, Wei-Cheng; Sridhar, Kusha; Busso, Carlos (January 2024, IEEE/ACM Transactions on Audio, Speech, and Language Processing)

It is difficult to achieve robust and well-generalized models for tasks involving subjective concepts such as emotion. Noisy labels are inevitable given the ambiguous nature of human perception. Methodologies relying on semi-supervised learning (SSL) and curriculum learning have been proposed to enhance the generalization of the models. This study proposes a novel deep mutual information (DeepMI) metric, built with the SSL pre-trained DeepEmoCluster framework, to establish the difficulty of samples. The DeepMI metric quantifies the relationship between the acoustic patterns and emotional attributes (e.g., arousal, valence, and dominance). The DeepMI metric provides a better curriculum, achieving state-of-the-art performance that is higher than results obtained with existing curriculum metrics for speech emotion recognition (SER). We evaluate the proposed method with three emotional datasets in matched and mismatched testing conditions. The experimental evaluations systematically show that a model trained with the DeepMI metric not only obtains competitive generalization performance, but also maintains convergence stability. Furthermore, the extracted DeepMI values are highly interpretable, reflecting information ranks of the training samples.
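The curriculum idea in this abstract, ordering training samples by a difficulty measure, can be illustrated with a minimal sketch. It is not the DeepMI metric: `curriculum_stages` and the `difficulty` callable are hypothetical names, and the per-sample score is a placeholder that a real system would compute from its own metric.

```python
def curriculum_stages(samples, difficulty, n_stages=3):
    """Yield cumulative training pools, lowest difficulty score first.

    `difficulty` is a hypothetical callable returning a per-sample score;
    it is a placeholder, not the DeepMI metric from the abstract.
    """
    ranked = sorted(samples, key=difficulty)
    stage_size = -(-len(ranked) // n_stages)  # ceiling division
    for stage in range(1, n_stages + 1):
        # Each stage adds the next-hardest slice to the pool used so far.
        yield ranked[: stage * stage_size]


if __name__ == "__main__":
    toy = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.3)]
    for pool in curriculum_stages(toy, difficulty=lambda s: s[1], n_stages=2):
        print([name for name, _ in pool])
```

A model would be trained on each pool in turn, so that harder samples only enter training after the easier ones.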
Full Text Available
Generative Approach Using Soft-Labels to Learn Uncertainty in Predicting Emotional Attributes

https://doi.org/10.1109/ACII52823.2021.9597461

Sridhar, Kusha; Lin, Wei-Cheng; Busso, Carlos (September 2021, International Conference on Affective Computing and Intelligent Interaction (ACII 2021))

Full Text Available
DeepEmoCluster: A Semi-Supervised Framework for Latent Cluster Representation of Speech Emotions

https://doi.org/10.1109/ICASSP39728.2021.9414035

Lin, Wei-Cheng; Sridhar, Kusha; Busso, Carlos (June 2021, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021))
Semi-supervised learning (SSL) is an appealing approach to resolve the generalization problem for speech emotion recognition (SER) systems. By utilizing large amounts of unlabeled data, SSL is able to gain extra information about the prior distribution of the data. Typically, it can lead to better and more robust recognition performance. Existing SSL approaches for SER include variations of encoder-decoder model structures such as autoencoders (AEs) and variational autoencoders (VAEs), where it is difficult to interpret the learning mechanism behind the latent space. In this study, we introduce a new SSL framework, which we refer to as the DeepEmoCluster framework, for attribute-based SER tasks. The DeepEmoCluster framework is an end-to-end model with mel-spectrogram inputs, which combines a self-supervised pseudo-labeling classification network with a supervised emotional attribute regressor. The approach encourages the model to learn latent representations by maximizing the emotional separation of K-means clusters. Our experimental results based on the MSP-Podcast corpus indicate that the DeepEmoCluster framework achieves competitive prediction performance in a fully supervised scheme, outperforming baseline methods in most of the conditions. The approach can be further improved by incorporating an extra unlabeled set. Moreover, our experimental results explicitly show that the latent clusters have emotional dependencies, enriching the geometric interpretation of the clusters.
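The pseudo-labeling step mentioned in this abstract relies on K-means clusters. The sketch below shows only one generic K-means iteration (nearest-centroid assignment, then centroid update) on toy vectors. It does not reproduce the paper's network, losses, or features, and the function names are hypothetical.

```python
def assign_pseudo_labels(embeddings, centroids):
    """Label each embedding with the index of its nearest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda k: sq_dist(e, centroids[k]))
        for e in embeddings
    ]


def update_centroids(embeddings, labels, centroids):
    """Move each centroid to the mean of the embeddings assigned to it."""
    updated = []
    for k, old in enumerate(centroids):
        members = [e for e, lab in zip(embeddings, labels) if lab == k]
        if not members:
            updated.append(list(old))  # an empty cluster keeps its position
            continue
        updated.append([sum(dim) / len(members) for dim in zip(*members)])
    return updated


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    centers = [[1.0, 1.0], [9.0, 9.0]]
    labels = assign_pseudo_labels(points, centers)
    print(labels, update_centroids(points, labels, centers))
```

In a framework of the kind described, cluster indices like these could serve as classification targets for unlabeled data, alongside a supervised regressor trained on the labeled portion.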
Full Text Available
Deep Representation Learning for Affective Speech Signal Analysis and Processing: Preventing unwanted signal disparities

https://doi.org/10.1109/MSP.2021.3105939

Lee, Chi-Chun; Sridhar, Kusha; Li, Jeng-Lin; Lin, Wei-Cheng; Su, Bo-Hao; Busso, Carlos (November 2021, IEEE Signal Processing Magazine)

Full Text Available
Modeling Uncertainty in Predicting Emotional Attributes from Spontaneous Speech

https://doi.org/10.1109/ICASSP40776.2020.9054237

Sridhar, Kusha; Busso, Carlos (May 2020, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020))

A challenging task in affective computing is to build reliable speech emotion recognition (SER) systems that can accurately predict emotional attributes from spontaneous speech. To increase the trust in these SER systems, it is important to predict not only their accuracy, but also their confidence. An intriguing approach to predict uncertainty is Monte Carlo (MC) dropout, which obtains predictions from multiple feed-forward passes through a deep neural network (DNN) by using dropout regularization in both training and inference. This study evaluates this approach with regression models to predict emotional attribute scores for valence, arousal and dominance. The analysis illustrates that predicting uncertainty in this problem is possible, where the performance is higher for samples in the test set with lower uncertainty. The study evaluates uncertainty estimation as a function of the emotional attributes, showing that samples with extreme values have lower uncertainty. Finally, we demonstrate the benefits of uncertainty estimation with reject option, where a classifier can decline to give a prediction when its confidence is low. By rejecting only 25% of the test set with the highest uncertainty, we achieve relative performance gains of 7.34% for arousal, 13.73% for valence and 8.79% for dominance.
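The two mechanisms named in this abstract, MC dropout and a reject option, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: a single linear layer with dropout on its inputs stands in for the DNN, the spread of the stochastic outputs is used as the uncertainty estimate, and all function names are hypothetical.

```python
import random
import statistics


def mc_dropout_predict(weights, features, drop_prob=0.5, passes=50, rng=None):
    """Return (mean, std) of the outputs of several stochastic forward passes.

    Toy stand-in for a DNN: one linear layer whose inputs are randomly
    dropped on every pass, i.e. dropout stays active at inference time.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    keep = 1.0 - drop_prob
    outputs = []
    for _ in range(passes):
        total = 0.0
        for w, x in zip(weights, features):
            if rng.random() < keep:      # this input survives the pass
                total += w * x / keep    # inverted-dropout rescaling
        outputs.append(total)
    return statistics.fmean(outputs), statistics.pstdev(outputs)


def reject_most_uncertain(predictions, reject_fraction=0.25):
    """Return sorted indices of the samples kept after rejection.

    `predictions` is a list of (mean, std) pairs; the fraction with the
    highest std is rejected.
    """
    n_reject = int(len(predictions) * reject_fraction)
    order = sorted(range(len(predictions)), key=lambda i: predictions[i][1])
    return sorted(order[: len(predictions) - n_reject])
```

Evaluating a performance metric only on the kept indices, and comparing it with the same metric on the full test set, gives the kind of relative gain the abstract reports.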
Full Text Available
Role of Regularization in the Prediction of Valence from Speech

https://doi.org/10.21437/Interspeech.2018-2508

Sridhar, Kusha; Parthasarathy, Srinivas; Busso, Carlos (September 2018, Interspeech 2018)

Regularization plays a key role in improving the prediction of emotions using attributes such as arousal, valence and dominance. Regularization is particularly important with deep neural networks (DNNs), which have millions of parameters. While previous studies have reported competitive performance for arousal and dominance, the prediction results for valence using acoustic features are significantly lower. We hypothesize that higher regularization can lead to better results for valence. This study focuses on exploring the role of dropout as a form of regularization for valence, suggesting the need for higher regularization. We analyze the performance of regression models for valence, arousal and dominance as a function of the dropout probability. We observe that the optimum dropout rates are consistent for arousal and dominance. However, the optimum dropout rate for valence is higher. To understand the need for higher regularization for valence, we perform an empirical analysis to explore the nature of emotional cues conveyed in speech. We compare regression models with speaker-dependent and speaker-independent partitions for training and testing. The experimental evaluation suggests stronger speaker-dependent traits for valence. We conclude that higher regularization is needed for valence to force the network to learn global patterns that generalize across speakers.
Full Text Available
